In the previous document, we applied Surrogate Variable Analysis (SVA) to the mean heterogeneity data. Our toy examples contain 80 samples belonging to two classes: the first 40 samples belong to Class 1 and the remaining 40 belong to Class 2. The samples are collected from two batches, Batch1 and Batch2.
Each sample has 100 features:
\(R\) is the residual matrix obtained by regressing the data matrix \(X\) on the primary variable (the class label vector) \(Y\). Since the effect of the primary variable is constant, the \(X_1\) entries of the residual matrix are 0. The problem is then equivalent to detecting mean heterogeneity in a matrix of Gaussian random noise.
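The construction of \(R\) can be sketched as follows. This is a minimal illustration, not the code behind the results in this document: the matrix layout (rows are features, columns are samples), the noise-free class effect of size 2.0 standing in for \(X_1\), and the helper `residual_row` are all assumptions made for the example. It relies on the fact that regressing a feature on an intercept plus a two-level label gives the within-class mean as the fitted value.

```python
import random
from statistics import fmean

random.seed(0)

N_FEATURES, N_SAMPLES = 100, 80
# Class labels: the first 40 samples are Class 1 (coded 0), the rest Class 2 (coded 1).
y = [0] * 40 + [1] * 40

# Hypothetical data matrix X (rows = features, columns = samples).
# Row 0 stands in for X_1: a noise-free class effect (assumed size 2.0).
# All other rows are standard Gaussian noise.
X = [[2.0 * label for label in y]]
X += [[random.gauss(0.0, 1.0) for _ in y] for _ in range(N_FEATURES - 1)]


def residual_row(row, labels):
    """Residuals from regressing one feature on an intercept plus a binary label.

    For a two-level label the least-squares fit is the within-class mean,
    so each residual is the observed value minus its class mean.
    """
    class_mean = {
        c: fmean(v for v, lab in zip(row, labels) if lab == c)
        for c in set(labels)
    }
    return [v - class_mean[lab] for v, lab in zip(row, labels)]


# Residual matrix R: the class effect is removed from every feature.
R = [residual_row(row, y) for row in X]

print("largest |residual| in the X_1 row:", max(abs(v) for v in R[0]))
```

Under these assumptions the row standing in for \(X_1\) is fully explained by the class label, so its residuals vanish, while every other row keeps its noise with the class means removed.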
The simulation results indicate that SVA performs poorly at detecting the surrogate variable in our toy examples. The major problem in the SVA algorithm is that it forms the new matrix \(R^{*}\) by permuting each row of \(R\) independently, with the aim of removing any structure in the matrix. However, this operation does not actually break the Gaussian mixture structure. In \(R\), the rows \(X_3, \dots, X_{100}\) are Gaussian random noise, which is spherically symmetric, so permuting within these rows does not change the structure of the matrix. Permuting \(X_2\) also preserves its Gaussian mixture structure. Therefore, after permuting each row independently, the matrix as a whole still contains a Gaussian mixture. In our toy examples, a more efficient way to detect the surrogate variable is to permute each column independently.
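The difference between the two permutation schemes can be illustrated with a small sketch. It does not call the sva package; the residual matrix is simulated directly, and the shift size `SHIFT`, the alternating batch pattern, and the helper names are assumptions chosen for illustration. The largest row sum of squares is used as a simple proxy because it is a lower bound on the top eigenvalue of \(R R^{T}\).

```python
import random

random.seed(1)

N_FEATURES, N_SAMPLES = 100, 80
SHIFT = 6.0  # assumed gap between the two batch means, large for illustration

# Hypothetical residual matrix R (rows = features, columns = samples):
# standard Gaussian noise everywhere, except that row 1 (standing in for X_2)
# is a two-component Gaussian mixture with means -SHIFT/2 and +SHIFT/2.
R = [[random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)] for _ in range(N_FEATURES)]
R[1] = [v + (SHIFT / 2 if j % 2 else -SHIFT / 2) for j, v in enumerate(R[1])]


def permute_rows(M):
    """Shuffle the entries of each row independently."""
    return [random.sample(row, len(row)) for row in M]


def permute_columns(M):
    """Shuffle the entries of each column independently."""
    shuffled_cols = [random.sample(col, len(col)) for col in zip(*M)]
    return [list(row) for row in zip(*shuffled_cols)]


def max_row_energy(M):
    """Largest row sum of squares; a lower bound on the top eigenvalue of M M^T."""
    return max(sum(v * v for v in row) for row in M)


R_row = permute_rows(R)
R_col = permute_columns(R)

print("original     :", round(max_row_energy(R), 1))
print("row-permuted :", round(max_row_energy(R_row), 1))
print("col-permuted :", round(max_row_energy(R_col), 1))
```

Row permutation leaves every row with exactly the same values, so the mixture row keeps all of its energy and the permuted matrix is as structured as the original. Column permutation scatters the shifted entries across rows, so no single row dominates any more.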
In this case, \(\pi_1 = \pi_2 = 1\), which means that all samples come from Batch1, so there is no batch effect in the data. This case serves as a baseline for understanding the behavior of the eigenvalues in the analysis.
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
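The role of \(\pi_1\) and \(\pi_2\) can be made concrete with a short sketch. It assumes that \(\pi_k\) is the fraction of Class \(k\) samples drawn from Batch1, which is consistent with the baseline above where \(\pi_1 = \pi_2 = 1\) puts every sample in Batch1. The function `assign_batches` and its fixed-count assignment rule are hypothetical; the rule actually used in the simulations is not shown in this document.

```python
import random


def assign_batches(n_per_class, pi_1, pi_2, seed=0):
    """Return a batch label (1 or 2) for each of the 2 * n_per_class samples.

    Assumption: pi_k is the fraction of Class k samples that come from Batch1.
    Class 1 samples are listed first, followed by Class 2 samples.
    """
    rng = random.Random(seed)
    labels = []
    for pi in (pi_1, pi_2):
        n_batch1 = round(pi * n_per_class)
        block = [1] * n_batch1 + [2] * (n_per_class - n_batch1)
        rng.shuffle(block)  # the order of samples within a class is arbitrary
        labels.extend(block)
    return labels


baseline = assign_batches(40, 1.0, 1.0)  # every sample in Batch1
balanced = assign_batches(40, 0.5, 0.5)  # each class split evenly

print("baseline Batch1 count:", baseline.count(1))
print("balanced Batch1 count per class:", balanced[:40].count(1), balanced[40:].count(1))
```

Other settings of \(\pi_1\) and \(\pi_2\) can be passed to the same function in the same way.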
In this case, \(\pi_1 = \pi_2 = 0.5\), which indicates balance in the following two ways:
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
To better understand how SVA behaves under heterogeneity, we design three cases of unbalanced batch effects.
In this case, it indicates:
By the symmetry of Batch1 and Batch2, we only need to consider the case \(\pi_1 = \pi_2, \pi_1 + \pi_2 < 1\). We run simulations for two settings of \(\pi_1\) and \(\pi_2\):
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
In this case, it indicates:
By the symmetry of Batch1 and Batch2, we only need to consider the case \(\pi_1 > \pi_2, \pi_1 + \pi_2 = 1\). We run simulations for two settings of \(\pi_1\) and \(\pi_2\):
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
In this case, it indicates:
By the symmetry of Batch1 and Batch2, we only need to consider the case \(\pi_1 > \pi_2, \pi_1 + \pi_2 < 1\). We run simulations for two settings of \(\pi_1\) and \(\pi_2\):
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5
## [1] "The number of surrogate variable: 0"
## [1] "Mannually setting the number of surrogate variable as 1 to apply the sva algorithm"
## Number of significant surrogate variables is: 1
## Iteration (out of 5 ):1 2 3 4 5